Genome‐wide virus‐integration analysis reveals a common insertional mechanism of HPV, HBV and EBV

Dear Editor, Human papillomavirus (HPV), hepatitis B virus (HBV) and Epstein–Barr virus (EBV) are the three most oncogenic DNA viruses, contributing to 15 different types of cancer.1 Although these viruses differ in many respects, they share one key step: integration of their DNA into the human genome, which could promote carcinogenesis.2–4 In this study, we developed and applied a novel pipeline named viral integration pathway analysis (VIPA; Figures S1–S8, Supplementary Notes 1–3 and Table S1) to elucidate the integration mechanism shared by HPV, HBV and EBV, with the aim of gaining a deeper understanding of virus-induced carcinogenesis and of informing the corresponding anticancer therapies.
First, we conducted HPV capture sequencing and identified 1002 HPV integration breakpoints in 24.8% (225/910) of non-cancer HPV-infected samples, 588 breakpoints in 38.0% (125/329) of cervical precancer samples and 1597 breakpoints in 69.0% (158/227) of cancer samples (Figure 1A). Overall, integration was detected in 34.7% (508/1466) of samples, with an average of 6.27 integration breakpoints per integration-positive sample. We observed 24 recurrent integration hotspots (genes with n ≥ 5 integration positions located within 500 kb upstream or downstream of the gene) in our dataset (Figure 1A). Of these, 10 had been reported previously and 14 are newly identified HPV integration hotspot genes (Table S2).
Next, we found that the distributions of integrated HPV types and integration status differed among non-cancer HPV infection, cervical precancer and cancer samples (Figure 1B,C). Specifically, the HPV16 integration percentage was only 10% (ranked third) in non-cancer samples but increased to 33.4% (ranked first) in precancer and 55.5% (ranked first) in cancer samples. The HPV18 integration percentage was only 3.1% in non-cancer samples and 5.8% in precancer samples, rising to 7.9% (ranked second) in cancer samples.
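The hotspot criterion above (n ≥ 5 integration positions within 500 kb of a gene) can be sketched as a simple window count. This is a minimal illustration, not the VIPA implementation: it assumes that n counts distinct samples (counting breakpoints instead is a one-line change) and that the window is measured from the ends of the gene body, and it uses hypothetical gene names and coordinates.

```python
from collections import defaultdict
from typing import NamedTuple

WINDOW_BP = 500_000  # 500 kb upstream/downstream of the gene body
MIN_SAMPLES = 5      # recurrence threshold (n >= 5); assumed here to count distinct samples


class Breakpoint(NamedTuple):
    sample: str
    chrom: str
    pos: int


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int


def call_hotspots(breakpoints, genes, window=WINDOW_BP, min_n=MIN_SAMPLES):
    """Return {gene name: number of distinct samples} for genes with >= min_n
    samples carrying a breakpoint within `window` bp of the gene body."""
    samples_per_gene = defaultdict(set)
    for gene in genes:
        lo, hi = gene.start - window, gene.end + window
        for bp in breakpoints:  # naive O(genes x breakpoints) scan
            if bp.chrom == gene.chrom and lo <= bp.pos <= hi:
                samples_per_gene[gene.name].add(bp.sample)
    return {name: len(s) for name, s in samples_per_gene.items() if len(s) >= min_n}


# Hypothetical toy data (gene names and coordinates are placeholders).
genes = [Gene("GENE_A", "chr1", 1_000_000, 1_050_000),
         Gene("GENE_B", "chr3", 20_000_000, 20_010_000)]
breakpoints = (
    [Breakpoint(f"S{i}", "chr1", 1_200_000 + 1_000 * i) for i in range(6)]
    + [Breakpoint(f"S{i}", "chr3", 19_900_000) for i in range(3)]
    + [Breakpoint("S0", "chr1", 5_000_000)]
)
print(call_hotspots(breakpoints, genes))  # {'GENE_A': 6}
```

The nested loop is quadratic; for genome-scale data an interval index (e.g., sorted positions searched with `bisect`) or `bedtools window` would be the usual choice.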

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
The average numbers of integration events were 4.4 for non-cancer infection, 4.7 for cervical precancer and 10.1 for cancer samples, indicating that HPV integration increased with disease progression (non-cancer vs. precancer, p = .011; precancer vs. cancer, p < .0001; Wilcoxon test with false discovery rate correction) and may serve as an early warning biomarker of carcinogenesis (Figure 1C). When the average number of integration events was used to predict clinical outcomes, it distinguished high-grade squamous intraepithelial lesion (HSIL)± (comprising HSIL and cancer) with an AUC of .722. HPV16 showed the best predictive performance for HSIL± (AUC = .859), and HPV18 performed comparably (AUC = .819; Figure 1D).
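The statistics in this paragraph rest on two standard building blocks. The sketch below, using only the Python standard library, computes an ROC AUC through its Mann-Whitney identity and applies a Benjamini-Hochberg adjustment. The text does not name the FDR procedure, so Benjamini-Hochberg is an assumption, and the scores are hypothetical rather than the study's data.

```python
def roc_auc(pos_scores, neg_scores):
    """ROC AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative one (ties count 1/2); this equals
    the Mann-Whitney U statistic divided by n_pos * n_neg."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg FDR-adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical integration-event counts per sample (not data from this study).
hsil_or_worse = [3, 4, 5]
below_hsil = [1, 2, 3]
print(roc_auc(hsil_or_worse, below_hsil))  # 8.5 / 9, about 0.944
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

In practice, `scipy.stats.mannwhitneyu` and `sklearn.metrics.roc_auc_score` provide tested equivalents.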
To identify integration features common to HPV, HBV and EBV, we then collected capture sequencing data for all three viruses. In total, we detected 4390 integration breakpoints for HPV, 4010 for HBV and 174 for EBV (Tables S3–S5). Intriguingly, 21 integration genes were shared by all three viruses (Table S6), suggesting that these genomic loci may play roles in cancers associated with oncogenic viruses.
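Once each breakpoint has been annotated to a gene, finding genes shared by all three viruses reduces to a set intersection. A minimal sketch, assuming per-virus gene sets are already available; the gene names below are hypothetical placeholders, not the 21 genes in Table S6.

```python
def shared_integration_genes(genes_by_virus):
    """Genes hit by integration breakpoints of every virus in the mapping."""
    gene_sets = [set(genes) for genes in genes_by_virus.values()]
    return set.intersection(*gene_sets) if gene_sets else set()


def sharing_pattern(genes_by_virus):
    """Map each gene to the sorted tuple of viruses integrating near it."""
    pattern = {}
    for virus, genes in genes_by_virus.items():
        for gene in genes:
            pattern.setdefault(gene, set()).add(virus)
    return {gene: tuple(sorted(viruses)) for gene, viruses in pattern.items()}


# Hypothetical gene annotations (placeholders, not Table S6).
genes_by_virus = {
    "HPV": {"GENE_A", "GENE_B", "GENE_C"},
    "HBV": {"GENE_A", "GENE_C", "GENE_D"},
    "EBV": {"GENE_A", "GENE_E"},
}
print(shared_integration_genes(genes_by_virus))   # {'GENE_A'}
print(sharing_pattern(genes_by_virus)["GENE_C"])  # ('HBV', 'HPV')
```

The second helper also exposes genes shared by exactly two viruses, which a plain three-way intersection hides.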
Next, we explored viral integration patterns using the identified human-viral junctional sequences (defined as sequences containing at least 30 bp of both human and viral sequence at the integration site) from expanded integration datasets (Table S7 and Supplementary Notes 4 and 5). Previous studies have indicated that integration of all three viruses is mediated by microhomology (MH; Figure S9).4–7 However, it remains unclear how lateral microhomologies (microhomologies located a short distance from the junction site) mediate the integration process (Figure 2A–C). Inspired by recent insights into alternative end-joining (alt-EJ),8,9 we speculated that synthesis-dependent end-joining (SD-EJ) … Figures S11 and S12).
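The junction definitions above can be made concrete. Assuming each junctional read is human-first and that the human and viral alignments have each been extended as far as they match, the overlap between the two alignments distinguishes the three junction products reported for SD-EJ events in this letter: junctional MH (overlap), apparent blunt join (abutting) and short insertion (gap). The function name, and the choice to count the 30-bp minimum on sequence unique to each genome, are assumptions of this sketch.

```python
MIN_FLANK_BP = 30  # >= 30 bp of human and of viral sequence at the junction


def classify_junction(read_len, h_end, v_start, min_flank=MIN_FLANK_BP):
    """Classify one human-viral junction from maximal alignment coordinates.

    Assumes a human-first read: read[0:h_end] aligns to the human genome and
    read[v_start:read_len] aligns to the viral genome, each alignment extended
    as far as it matches. Returns (product, length), or None if the read does
    not carry min_flank bp of sequence explained by each genome alone.
    """
    human_only = min(h_end, v_start)             # bases assignable only to human
    viral_only = read_len - max(h_end, v_start)  # bases assignable only to virus
    if human_only < min_flank or viral_only < min_flank:
        return None
    overlap = h_end - v_start
    if overlap > 0:   # bases explained by both genomes
        return ("junctional MH", overlap)
    if overlap == 0:  # alignments abut exactly
        return ("apparent blunt join", 0)
    return ("short insertion", -overlap)  # bases explained by neither genome


print(classify_junction(100, 52, 50))  # ('junctional MH', 2)
print(classify_junction(100, 50, 53))  # ('short insertion', 3)
```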
We analysed the roles of SD-EJ using computational simulation (Figure S13) in 4341 human-HPV junctional sequences (Table S3), 4010 human-HBV junctional sequences (Table S4) and … human-EBV junctional sequences (Table S5). SD-EJ was significantly enriched for all three viruses (Figure 3A).
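The enrichment claim rests on comparing observed junctions with a simulated expectation (Figure S13). As a hedged illustration of that idea, the sketch below builds the simplest random-joining null model we can assume (uniform, independent bases; no genome composition or sequencing error) and estimates how often a junction shows MH purely by chance. It is not the VIPA simulation, and its parameters are hypothetical.

```python
import random

BASES = "ACGT"


def simulated_mh_length(rng, flank=30):
    """Join random 'human' and 'viral' sequences at one point and return the
    junctional MH length: how far each alignment can be extended across the
    junction while still matching its own reference."""
    human = "".join(rng.choices(BASES, k=2 * flank))  # human reference around the break
    viral = "".join(rng.choices(BASES, k=2 * flank))  # viral reference around the break
    read = human[:flank] + viral[flank:]              # chimeric read, junction at `flank`
    a = 0  # extension of the human alignment into the viral part
    while flank + a < len(read) and read[flank + a] == human[flank + a]:
        a += 1
    b = 0  # extension of the viral alignment back into the human part
    while b < flank and read[flank - 1 - b] == viral[flank - 1 - b]:
        b += 1
    return a + b


def expected_mh_fraction(min_mh=1, n_sim=20_000, seed=0):
    """Monte Carlo estimate of P(junctional MH >= min_mh) under random joining."""
    rng = random.Random(seed)
    hits = sum(simulated_mh_length(rng) >= min_mh for _ in range(n_sim))
    return hits / n_sim


# Under uniform, independent bases the exact value for min_mh=1 is 1 - (3/4)**2 = 7/16.
print(expected_mh_fraction(min_mh=1), 7 / 16)
```

Because chance overlaps are common (under this toy model, 7/16 of random joins show at least 1 bp of MH), an observed signature frequency is informative only relative to such a baseline; a realistic null would sample actual human and viral reference sequence.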
Then, the repair models and products of SD-EJ were further analysed (Figure 3B). The loop-out model accounted for 47.9%–61.4% of SD-EJ events (HPV: 61.4%; HBV: 57.7%; EBV: 47.9%), whereas the snap-back model accounted for 38.8%–52.1% (HPV: 38.8%; HBV: 42.3%; EBV: 52.1%). Among repair products, junctional MH was the major type, accounting for 89.5% of HPV, 91.3% of HBV and 88.1% of EBV SD-EJ integration events, followed by apparent blunt joins (HPV: 8.4%; HBV: 7.9%; EBV: 10.4%) and short insertions (HPV: 2.0%; HBV: .8%; EBV: 1.5%). Junctional MH occurred significantly more often in the observed group than in the expected group (Figure 3C, Supplementary Note 6), whereas apparent blunt joins occurred significantly less often. Short insertions were significantly enriched in the HPV and HBV datasets but did not differ significantly between the observed and expected EBV groups (n = 1 vs. n = .14, p = 1, Fisher's exact test), owing to the relatively small EBV dataset (Figure 3C, Supplementary Note 6).
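Fisher's exact test, cited above for the EBV short-insertion comparison, can be written with exact integer arithmetic. This stdlib sketch implements the usual two-sided version, which sums the probabilities of all tables with the same margins that are no more probable than the observed one; the counts in the example are hypothetical. `scipy.stats.fisher_exact` provides the same test.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 count table [[a, b], [c, d]].

    Hypergeometric weights share the denominator C(n, a + b), so they are
    compared as exact integers and no floating-point tolerance is needed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    weights = {x: comb(col1, x) * comb(n - col1, row1 - x) for x in range(lo, hi + 1)}
    observed = weights[a]
    return sum(w for w in weights.values() if w <= observed) / comb(n, row1)


# Hypothetical counts: junctions with / without a given product in an
# observed set (row 1) and a simulated expected set (row 2).
print(fisher_exact_two_sided(6, 2, 1, 4))  # 132/1287, about 0.103
```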
Finally, we assigned each breakpoint of the three dsDNA viruses to one of three integration pathways, applied in order: (i) the SD-EJ pathway, if SD-EJ structures were present; otherwise (ii) another alt-EJ pathway, if microhomology overhangs were present; and otherwise (iii) the NHEJ pathway, lacking both signatures (Figure 3D). With a 10-bp flanking length, the SD-EJ pathway accounted for 59.11% of HPV, 65.04% of HBV and 48.38% of EBV breakpoints, whereas the unclassified NHEJ category accounted for 37.15%, 28.29% and 48.55%, respectively (Figure 3E). These data suggest that the SD-EJ repair pathway may play an important role in the integration of all three viruses into the human genome. Together, we report the largest genome-wide landscape of HPV, HBV and EBV insertional mutagenesis and show that the three viruses share a common SD-EJ integration mechanism. Based on the identified integration patterns and the biological features of the three viruses, we propose a new model of the integration process of HPV, HBV and EBV (Figure 4), providing insights into virus-induced cancer.
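The hierarchical assignment above (SD-EJ first, then other alt-EJ, otherwise NHEJ) can be sketched as a decision cascade. The SD-EJ rule here is a deliberately simplified, hypothetical stand-in for the authors' criteria: a k-mer spanning the junction that recurs within the 10-bp flank as a direct repeat is called loop-out-like, and one that recurs as an inverted repeat is called snap-back-like. The value k = 4 and the helper names are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def classify_pathway(read, h_end, v_start, flank=10, k=4):
    """Hierarchical, simplified pathway call for one junction (hypothetical rules).

    read[:h_end] aligns to human and read[v_start:] to virus (maximal alignments).
    (i) SD-EJ: the k-mer spanning the junction recurs within `flank` bp as a
        direct repeat (loop-out-like) or an inverted repeat (snap-back-like);
    (ii) otherwise alt-EJ if the junction carries microhomology;
    (iii) otherwise NHEJ.
    """
    j = (h_end + v_start) // 2  # approximate junction position
    start = max(0, j - k // 2)
    kmer = read[start:start + k]
    left = read[max(0, start - flank):start]
    right = read[start + k:start + k + flank]
    if kmer in left or kmer in right:
        return "SD-EJ (loop-out)"
    if revcomp(kmer) in left or revcomp(kmer) in right:
        return "SD-EJ (snap-back)"
    if h_end > v_start:  # alignments overlap: junctional microhomology
        return "alt-EJ"
    return "NHEJ"


print(classify_pathway("TTGACATTTT" + "GACA" + "C" * 10, 12, 12))  # SD-EJ (loop-out)
print(classify_pathway("A" * 10 + "GACA" + "C" * 10, 13, 11))      # alt-EJ
```

Short k-mers recur by chance fairly often within 10-bp flanks, which is one reason such calls need to be compared against a simulated baseline rather than taken at face value.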

ACKNOWLEDGEMENTS
We thank the Tianhe Supercomputer Center for computational support and GeneRulor for probe design and partial experiment.